Welfare Diplomacy: Benchmarking Language Model Cooperation
The growing capabilities and increasingly widespread deployment of AI systems
necessitate robust benchmarks for measuring their cooperative capabilities.
Unfortunately, most multi-agent benchmarks are either zero-sum or purely
cooperative, providing limited opportunities for such measurements. We
introduce a general-sum variant of the zero-sum board game Diplomacy -- called
Welfare Diplomacy -- in which players must balance investing in military
conquest and domestic welfare. We argue that Welfare Diplomacy facilitates both
a clearer assessment of and stronger training incentives for cooperative
capabilities. Our contributions are: (1) proposing the Welfare Diplomacy rules
and implementing them via an open-source Diplomacy engine; (2) constructing
baseline agents using zero-shot prompted language models; and (3) conducting
experiments where we find that baselines using state-of-the-art models attain
high social welfare but are exploitable. Our work aims to promote societal
safety by aiding researchers in developing and assessing multi-agent AI
systems. Code to evaluate Welfare Diplomacy and reproduce our experiments is
available at https://github.com/mukobi/welfare-diplomacy
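To make the central trade-off concrete, here is a toy scoring sketch in the spirit of Welfare Diplomacy. It assumes a welfare-point rule in which a player accrues points each year equal to supply centers held minus units maintained; the authoritative rules are in the paper and the repository above, and the function names and numbers below are illustrative only.

```python
# Toy scoring sketch for Welfare Diplomacy. Assumed rule (see the paper and
# repository for the authoritative version): each year a player accrues
# Welfare Points equal to supply centers held minus units maintained, and
# total utility is the sum of points over a fixed-length game.

def welfare_points_for_year(supply_centers: int, units: int) -> int:
    """Welfare Points accrued in one game year under the assumed rule."""
    return max(0, supply_centers - units)

def total_utility(yearly_states: list[tuple[int, int]]) -> int:
    """Sum Welfare Points over a game.

    yearly_states: one (supply_centers, units) pair per game year.
    """
    return sum(welfare_points_for_year(sc, u) for sc, u in yearly_states)

# A player who demilitarizes accrues welfare; an all-out conqueror who
# matches every center with a unit accrues none.
print(total_utility([(5, 5), (6, 3), (6, 1)]))      # 0 + 3 + 5 = 8
print(total_utility([(8, 8), (10, 10), (12, 12)]))  # 0 + 0 + 0 = 0
```

Under a rule like this, hoarding units for conquest forfeits welfare, which is exactly the tension between military investment and domestic welfare that the benchmark is designed to measure.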
SuperHF: Supervised Iterative Learning from Human Feedback
While large language models demonstrate remarkable capabilities, they often
present challenges in terms of safety, alignment with human values, and
stability during training. Here, we focus on two prevalent methods used to
align these models, Supervised Fine-Tuning (SFT) and Reinforcement Learning
from Human Feedback (RLHF). SFT is simple and robust, powering a host of
open-source models, while RLHF is a more sophisticated method used in top-tier
models like ChatGPT but also suffers from instability and susceptibility to
reward hacking. We propose a novel approach, Supervised Iterative Learning from
Human Feedback (SuperHF), which seeks to leverage the strengths of both
methods. Our hypothesis is twofold: that the reward model used in RLHF is
critical for efficient data use and model generalization, and that the use of
Proximal Policy Optimization (PPO) in RLHF may not be necessary and could
contribute to instability issues. SuperHF replaces PPO with a simple supervised
loss and a Kullback-Leibler (KL) divergence prior. It creates its own training
data by repeatedly sampling a batch of model outputs and filtering them through
the reward model in an online learning regime. We then break down the reward
optimization problem into three components: robustly optimizing the training
rewards themselves, preventing reward hacking (exploitation of the reward
model that degrades model performance), as measured by a novel METEOR similarity
metric, and maintaining good performance on downstream evaluations. Our
experimental results show SuperHF exceeds PPO-based RLHF on the training
objective, easily and favorably trades off high reward with low reward hacking,
improves downstream calibration, and performs comparably on our GPT-4-based
qualitative evaluation scheme, all while being significantly simpler to
implement, highlighting SuperHF's potential as a competitive language model
alignment technique.
Comment: Accepted to the Socially Responsible Language Modelling Research (SoLaR) workshop at NeurIPS 2023
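The abstract describes the core training loop precisely enough to sketch: sample a batch of outputs, filter them through the reward model, then fit the policy to the kept samples with a supervised loss plus a KL prior toward the original model. The toy below is a minimal, self-contained illustration of that loop; the tiny categorical policy, the keep-top-quarter filtering rule, and the kl_coef value are assumptions for illustration, not the authors' implementation.

```python
# Self-contained toy of SuperHF-style training as described above:
# (1) sample a batch of outputs from the policy, (2) filter them through a
# reward model, (3) fit the policy to the kept samples with a supervised
# cross-entropy loss plus a KL prior toward the original frozen policy.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
vocab = 8                                        # toy space of model outputs
logits = torch.zeros(vocab, requires_grad=True)  # trainable policy
ref_log_probs = F.log_softmax(logits.detach().clone(), dim=-1)  # KL prior
reward = torch.randn(vocab)                      # stand-in reward model
opt = torch.optim.Adam([logits], lr=0.1)
kl_coef = 0.1

for step in range(100):
    probs = F.softmax(logits, dim=-1)
    log_probs = F.log_softmax(logits, dim=-1)

    # 1. Sample a batch of outputs from the current policy.
    batch = torch.multinomial(probs, num_samples=16, replacement=True)

    # 2. Filter through the reward model: keep the top-scoring quarter.
    kept = batch[reward[batch].argsort(descending=True)[:4]]

    # 3. Supervised loss on kept samples plus KL(policy || reference).
    nll = -log_probs[kept].mean()
    kl = (probs * (log_probs - ref_log_probs)).sum()
    loss = nll + kl_coef * kl

    opt.zero_grad()
    loss.backward()
    opt.step()

# After training, the policy should concentrate on high-reward outputs.
print("policy's top output:", int(F.softmax(logits, dim=-1).argmax()))
print("highest-reward output:", int(reward.argmax()))
```

In this reading, the KL term plays the role the abstract assigns to the prior: it keeps the filtered-imitation step from collapsing onto a few reward-model favorites, analogous to the KL penalty in PPO-based RLHF.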
Opportunities in Physics Education: Low-Cost Position Tracking for Use in Kinematics Labs
Traditional introductory physics kinematics laboratories utilize a few different instruments for locating objects in motion, all of which have shortcomings. Some provide only timing data, which heavily restricts trajectories and data collection. Some instruments provide more measurements but restrict object shapes, orientations, and textures. Still others require extensive pre-processing. None of these traditional instruments provide two- or three-dimensional position data. New, low-cost, local positioning technology, based on radio frequency wireless communications, is available that enables novel redesigns of physics laboratories. This technology provides two- and three-dimensional position measurements, continuously, at data rates of 10 Hz or faster, from any object to which it can be affixed. Our research group at Portland State University is exploring how this technology can be applied to reconstruct and improve introductory laboratories, making them easier to perform while increasing the amount of usable data gathered. Additionally, we seek to enhance the model-based learning experience in labs by confronting students with more diverse models than traditionally encountered. For example, we are pursuing applications in free-fall experiments, aerodynamic friction, two-dimensional motion, two-dimensional collisions, and tug-of-war competitions, as well as astronomy applications such as retrograde motion.
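As an illustration of the kind of analysis continuous 10 Hz position data enables, the snippet below fits a quadratic to vertical-position samples from a free-fall run to recover g. The sensor interface is not part of the abstract, so the data here are synthetic rather than measured.

```python
# Illustrative free-fall analysis: fit z(t) samples to a quadratic and
# recover g. The sensor interface is not shown; the data below are synthetic
# (z = z0 - g*t^2/2 plus roughly 1 cm of measurement noise).
import numpy as np

rate_hz = 10.0                                 # the claimed minimum data rate
t = np.arange(0.0, 1.0, 1.0 / rate_hz)         # one second of samples
g_true, z0 = 9.81, 5.0                         # m/s^2, initial height in m
rng = np.random.default_rng(0)
z = z0 - 0.5 * g_true * t**2 + rng.normal(0.0, 0.01, t.size)

# Fit z(t) = a*t^2 + b*t + c; under constant acceleration, g = -2a.
a, b, c = np.polyfit(t, z, deg=2)
print(f"estimated g = {-2.0 * a:.2f} m/s^2")   # should land near 9.81
```

The same fit-and-compare pattern extends to the other proposed labs, for example by adding a velocity-dependent term to the model when studying aerodynamic friction.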